Search CORE

1,779 research outputs found

Knowledge Graph Completion to Predict Polypharmacy Side Effects

Author: AM Manicone
D Sridhar
D Szklarczyk
DA Fishman
F Cheng
GE Hinton
HG Munshi
M Kuhn
M Zitnik
N. P. Tatonetti
W Zhang
Publication date: 22/10/2018

The polypharmacy side effect prediction problem considers cases in which two drugs taken individually do not result in a particular side effect; however, when the two drugs are taken in combination, the side effect manifests. In this work, we demonstrate that multi-relational knowledge graph completion achieves state-of-the-art results on the polypharmacy side effect prediction problem. Empirical results show that our approach is particularly effective when the protein targets of the drugs are well-characterized. In contrast to prior work, our approach provides more interpretable predictions and hypotheses for wet lab validation. Comment: 13th International Conference on Data Integration in the Life Sciences (DILS2018)
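The abstract does not name the model, so the following is only a generic sketch of multi-relational knowledge graph completion, not the authors' method. It assumes a DistMult-style scoring function: each side effect is a relation type with its own embedding, and a drug pair (d1, d2) is scored for that side effect as the sum of the elementwise products of the three vectors. The function names, drug names, side effect name and embedding values are hypothetical; learning the embeddings from known (drug, side effect, drug) triples is omitted.

```python
def distmult_score(head, relation, tail):
    """DistMult-style score: sum over k of head[k] * relation[k] * tail[k]."""
    return sum(h * r * t for h, r, t in zip(head, relation, tail))


def rank_drug_pairs(drug_emb, rel_emb, side_effect, candidate_pairs):
    """Rank candidate drug pairs by the score of (drug1, side_effect, drug2), highest first."""
    r = rel_emb[side_effect]
    scored = [
        ((d1, d2), distmult_score(drug_emb[d1], r, drug_emb[d2]))
        for d1, d2 in candidate_pairs
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical toy embeddings; in practice these would be learned from known triples.
drug_emb = {
    "drugA": [1.0, 0.5],
    "drugB": [0.8, -0.2],
    "drugC": [-1.0, 0.3],
}
rel_emb = {"hypotension": [1.0, 1.0]}

ranked = rank_drug_pairs(
    drug_emb, rel_emb, "hypotension",
    [("drugA", "drugB"), ("drugA", "drugC"), ("drugB", "drugC")],
)
for pair, score in ranked:
    print(pair, round(score, 3))
```

Because a DistMult score is symmetric in head and tail, the order of the two drugs does not affect the prediction, which fits unordered drug combinations; asymmetric relations would need a different scoring function.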

arXiv.org e-Print Archive

Crossref

STITCH 4: integration of protein-chemical interactions with user data

Author: Blicher T.H.
Bork P.
Jensen L.J.
Kuhn M.
Pletscher-Frankild S.
Szklarczyk D.
von Mering C.
Publication venue: 'Oxford University Press (OUP)'
Publication date: 01/01/2014

STITCH is a database of protein-chemical interactions that integrates many sources of experimental and manually curated evidence with text-mining information and interaction predictions. Available at http://stitch.embl.de, the resulting interaction network includes 390 000 chemicals and 3.6 million proteins from 1133 organisms. Compared with the previous version, the number of high-confidence protein-chemical interactions in human has increased by 45%, to 367 000. In this version, we added features for users to upload their own data to STITCH in the form of internal identifiers, chemical structures or quantitative data. For example, a user can now upload a spreadsheet with screening hits to easily check which interactions are already known. To increase the coverage of STITCH, we expanded the text mining to include full-text articles and added a prediction method based on chemical structures. We further changed our scheme for transferring interactions between species to rely on orthology rather than protein similarity. This improves the performance within protein families, where scores are now transferred only to orthologous proteins, but not to paralogous proteins. STITCH can be accessed through a web interface, an API and downloadable files.

MDC Repository

STRING v11: protein-protein association networks with increased coverage, supporting functional discovery in genome-wide experimental datasets

Author: Bork P.
Doncheva N.T.
Gable A.L.
Huerta-Cepas J.
Jensen L.J.
Junge A.
Lyon D.
Morris J.H.
Simonovic M.
Szklarczyk D.
von Mering C.
Wyder S.
Publication venue: 'Oxford University Press (OUP)'
Publication date: 08/01/2019

Proteins and their functional interactions form the backbone of the cellular machinery. Their connectivity network needs to be considered for the full understanding of biological phenomena, but the available information on protein-protein associations is incomplete and exhibits varying levels of annotation granularity and reliability. The STRING database aims to collect, score and integrate all publicly available sources of protein-protein interaction information, and to complement these with computational predictions. Its goal is to achieve a comprehensive and objective global network, including direct (physical) as well as indirect (functional) interactions. The latest version of STRING (11.0) more than doubles the number of organisms it covers, to 5090. The most important new feature is an option to upload entire, genome-wide datasets as input, allowing users to visualize subsets as interaction networks and to perform gene-set enrichment analysis on the entire input. For the enrichment analysis, STRING implements well-known classification systems such as Gene Ontology and KEGG, but also offers additional, new classification systems based on high-throughput text-mining as well as on a hierarchical clustering of the association network itself. The STRING resource is available online at https://string-db.org/

MDC Repository

eggNOG 5.0: a hierarchical, functionally and phylogenetically annotated orthology resource based on 5090 organisms and 2502 viruses

Author: Bork P.
Cook H.
Forslund S.K.
Heller D.
Hernández-Plaza A.
Huerta-Cepas J.
Jensen L.J.
Letunic I.
Mende D.R.
Rattei T.
Szklarczyk D.
von Mering C.
Publication venue: 'Oxford University Press (OUP)'
Publication date: 08/01/2019

eggNOG is a public database of orthology relationships, gene evolutionary histories and functional annotations. Here, we present version 5.0, featuring a major update of the underlying genome sets, which have been expanded to 4445 representative bacteria and 168 archaea derived from 25 038 genomes, as well as 477 eukaryotic organisms and 2502 viral proteomes that were selected for diversity and filtered by genome quality. In total, 4.4M orthologous groups (OGs) distributed across 379 taxonomic levels were computed together with their associated sequence alignments, phylogenies, HMM models and functional descriptors. Precomputed evolutionary analysis provides fine-grained resolution of duplication/speciation events within each OG. Our benchmarks show that, despite doubling the number of genomes, the quality of orthology assignments and functional annotations (80% coverage) has persisted without significant changes across this update. Finally, we improved eggNOG online services for fast functional annotation and orthology prediction of custom genomics or metagenomics datasets. All precomputed data are publicly available for downloading or via API queries at http://eggnog.embl.de

MDC Repository

STITCH 3: zooming in on protein–chemical interactions

Author: A. Franceschini
Berman
C. von Mering
Chen
D. Szklarczyk
Jensen
Kalinina
Kapitzky
Kuhn
L. J. Jensen
M. Kuhn
Okuno
P. Bork
Rognan
Roth
Publication venue: Oxford University Press
Publication date: 01/01/2011

To facilitate the study of interactions between proteins and chemicals, we have created STITCH, an aggregated database of interactions connecting over 300 000 chemicals and 2.6 million proteins from 1133 organisms. Compared to the previous version, the number of chemicals with interactions and the number of high-confidence interactions have both increased 4-fold. The database can be accessed interactively through a web interface, displaying interactions in an integrated network view. It is also available for computational studies through downloadable files and an API. As an extension in the current version, we offer the option to switch between two levels of detail, namely whether stereoisomers of a given compound are shown as a merged entity or as separate entities. Separate display of stereoisomers is necessary, for example, for carbohydrates and chiral drugs. Combining the isomers increases the coverage, as interaction databases and publications found through text mining will often refer to compounds without specifying the stereoisomer. The database is accessible at http://stitch.embl.de/

CiteSeerX

Crossref

PubMed Central

Copenhagen University Research Information System

ZORA

MDC Repository

Stretching and twisting of the DNA duplexes in coarse grained dynamical models

Author: Albert B
Allen M P
Kwiecińska J I
Marek Cieplak
Marenduzzo D
Smith S B
Sułkowska J I
Szklarczyk O
Szymon Niewieczerzał
Weber C
Publication venue: 'IOP Publishing'
Publication date: 31/12/2008

Three coarse-grained models of the double-stranded DNA are proposed and compared in the context of mechanical manipulation such as twisting and various schemes of stretching. The models differ in the number of effective beads (between two and five) representing each nucleotide. They all show similar behavior and, in particular, lead to torque-force phase diagrams qualitatively consistent with experiments and all-atom simulations.

arXiv.org e-Print Archive

Crossref

STRING v10: protein-protein interaction networks, integrated over the tree of life

Author: Bork P.
Forslund K.
Franceschini A.
Heller D.
Huerta-Cepas J.
Jensen L.J.
Kuhn M.
Roth A.
Santos A.
Simonovic M.
Szklarczyk D.
Tsafou K.P.
von Mering C.
Wyder S.
Publication venue: 'Oxford University Press (OUP)'
Publication date: 01/01/2015

The many functional partnerships and interactions that occur between proteins are at the core of cellular processing, and their systematic characterization helps to provide context in molecular systems biology. However, known and predicted interactions are scattered over multiple resources, and the available data exhibit notable differences in terms of quality and completeness. The STRING database (http://string-db.org) aims to provide a critical assessment and integration of protein-protein interactions, including direct (physical) as well as indirect (functional) associations. The new version 10.0 of STRING covers more than 2000 organisms, which has necessitated novel, scalable algorithms for transferring interaction information between organisms. For this purpose, we have introduced hierarchical and self-consistent orthology annotations for all interacting proteins, grouping the proteins into families at various levels of phylogenetic resolution. Further improvements in version 10.0 include a completely redesigned prediction pipeline for inferring protein-protein associations from co-expression data, an API for the R computing environment and improved statistical analysis for enrichment tests in user-provided networks.

MDC Repository

Gene expression drives the evolution of dominance.

Author: A Durvasula
A Platt
AF Agrawal
B Charlesworth
BM Henn
BY Kim
CD Huber
D Enard
D Ortega-Del Vecchyo
D Szklarczyk
DJ Balick
F Gao
F Manna
FH Shaw
H Kacser
HA Orr
I Frumkin
J Yang
JBS Haldane
JS Sanjak
KE Lohmueller
KM Teshima
LD Hurst
MA DePristo
MJ Simmons
N Phadnis
P Cingolani
P Lamesch
PY Novikova
RA Fisher
RD Hernandez
RN Gutenkunst
S Glémin
S Ossowski
S Williamson
S Wright
SH Williamson
T Bedford
T Kawakatsu
T Mukai
TI Gossmann
TT Hu
X Zheng
YB Simons
Publication venue: eScholarship, University of California
Publication date: 01/01/2018

Dominance is a fundamental concept in molecular genetics and has implications for understanding patterns of genetic variation, evolution, and complex traits. However, despite its importance, the degree of dominance in natural populations is poorly quantified. Here, we leverage multiple mating systems in natural populations of Arabidopsis to co-estimate the distribution of fitness effects and dominance coefficients of new amino acid changing mutations. We find that more deleterious mutations are more likely to be recessive than less deleterious mutations. Further, this pattern holds across gene categories, but varies with the connectivity and expression patterns of genes. Our work argues that dominance arises as a consequence of the functional importance of genes and their optimal expression levels.

Crossref

eScholarship - University of California

A strategy to incorporate prior knowledge into correlation network cutoff selection

Author: A Fabregat
A-L Barabási
AK Rider
B Pei
B Zhang
BW Matthews
CE Shannon
D Croft
D Szklarczyk
DMW Powers
E Benedetti
F Dieterle
G Altay
G Camilli
G Sales
H Carter
I Rudan
J Krumsiek
J Linde
J Schafer
J Schäfer
JE Huffman
JN Weinstein
K Baba
KA Hoadley
KT Do
M Ante
M Balbin
M Giurgiu
MHJ Selman
N Swainston
OJ Dunn
P Langfelder
R Albert
R Jefferis
R Tibshirani
S Hammoudeh
S Kim
V Stavrakas
Y Benjamini
Y Li
Y Yang
Y Zuo
Z Wang
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2020

Correlation networks are frequently used to statistically extract biological interactions between omics markers. Network edge selection is typically based on the statistical significance of the correlation coefficients. This procedure, however, is not guaranteed to capture biological mechanisms. Here we propose an alternative approach for network reconstruction: a cutoff selection algorithm that maximizes the overlap of the inferred network with available prior knowledge. We first evaluate the approach on IgG glycomics data, for which the biochemical pathway is known and well-characterized. Importantly, even in the case of incomplete or incorrect prior knowledge, the optimal network is close to the true optimum. We then demonstrate the generalizability of the approach with applications to untargeted metabolomics and transcriptomics data. For the transcriptomics case, we demonstrate that the optimized network is superior to statistical networks in systematically retrieving interactions that were not included in the biological reference used for optimization.
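As a rough illustration of cutoff selection against prior knowledge, the sketch below builds a correlation network at each candidate cutoff and keeps the cutoff whose edge set best overlaps a set of known interactions. The overlap measure (Jaccard index), the candidate cutoffs, the function names and the toy data are all assumptions for illustration; the abstract does not state which overlap criterion the authors use.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def pairwise_correlations(data):
    """Map each unordered pair of variable names to their Pearson correlation."""
    return {
        frozenset((a, b)): pearson(data[a], data[b])
        for a, b in combinations(sorted(data), 2)
    }


def edges_at_cutoff(corr, cutoff):
    """Network edges: pairs whose absolute correlation is at least the cutoff."""
    return {pair for pair, r in corr.items() if abs(r) >= cutoff}


def jaccard(edges, prior_edges):
    """Overlap between inferred and prior edge sets (assumed criterion)."""
    union = edges | prior_edges
    return len(edges & prior_edges) / len(union) if union else 0.0


def select_cutoff(data, prior_edges, cutoffs):
    """Return the cutoff whose network best overlaps the prior, plus all scores."""
    corr = pairwise_correlations(data)
    scores = {c: jaccard(edges_at_cutoff(corr, c), prior_edges) for c in cutoffs}
    best = max(cutoffs, key=lambda c: scores[c])
    return best, scores


# Hypothetical toy measurements for four markers across five samples.
data = {
    "A": [1, 2, 3, 4, 5],
    "B": [2, 4, 6, 8, 10],
    "C": [1, 3, 2, 5, 4],
    "D": [3, 1, 5, 1, 5],
}
# Hypothetical prior knowledge: known A-B and A-C interactions.
prior = {frozenset(("A", "B")), frozenset(("A", "C"))}

best, scores = select_cutoff(data, prior, [0.3, 0.5, 0.9])
print("best cutoff:", best)
for c in sorted(scores):
    print(c, round(scores[c], 3))
```

Scanning a fixed grid of cutoffs keeps the example simple; a finer grid or a different overlap statistic could change which cutoff is chosen.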

Crossref

Directory of Open Access Journals

Edinburgh Research Explorer

PuSH

MPG.PuRe